Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by Chanu, New Contributor II
  • 2398 Views
  • 2 replies
  • 2 kudos

Databricks JAR task type functionality

Hi, I would like to understand Databricks JAR-based workflow tasks. Can I interpret JAR-based runs as something like a spark-submit on a cluster? In the logs, I was expecting to see spark-submit --class com.xyz --num-executors 4 etc. And, the...

Latest Reply
Chanu
New Contributor II
  • 2 kudos

Hi, I tried using Workflows > Jobs > Create Task > JAR task type, uploaded my JAR and class, created a job cluster, and tested the task. The JAR reads some tables as input, does some transformations, and writes some other tables as output. I would like t...
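For reference, a minimal sketch of what such a task looks like when created through the Jobs API 2.1 (the workspace URL, token, class name, and JAR path below are hypothetical). Note that a JAR task invokes the main class directly on the cluster driver rather than shelling out to spark-submit, which would explain the missing spark-submit lines in the logs; there is a separate spark_submit_task type for that.

import requests

job = {
    "name": "jar-task-example",
    "tasks": [{
        "task_key": "run_jar",
        "spark_jar_task": {"main_class_name": "com.xyz.Main"},  # hypothetical class
        "libraries": [{"jar": "dbfs:/jars/my-app.jar"}],        # hypothetical JAR path
        "new_cluster": {
            "spark_version": "13.3.x-scala2.12",
            "node_type_id": "i3.xlarge",
            "num_workers": 4,
        },
    }],
}
resp = requests.post(
    "https://<workspace-url>/api/2.1/jobs/create",  # hypothetical workspace
    headers={"Authorization": "Bearer <token>"},
    json=job,
)
print(resp.json())  # returns {"job_id": ...} on success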

1 More Reply
by pasiasty2077, New Contributor
  • 8206 Views
  • 1 reply
  • 1 kudos

Partition filter is skipped when table is used in where condition, why?

Hi, maybe someone can help me. I want to run a very narrow query:
SELECT * FROM my_table WHERE snapshot_date IN ('2023-01-06', '2023-01-07')
-- part of the physical plan:
-- Location: PreparedDeltaFileIndex [dbfs:/...]
-- PartitionFilters: [cast(snaps...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

No hints for partition pruning, as far as I know. The reason the partitions were not pruned is that the second query generates a completely different plan. To be able to filter the partitions, a join first has to happen, and in this case that means the table has...
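A common workaround, sketched below under the assumption that the filter values come from a small dimension table (dim_dates here is hypothetical): collect the keys to the driver first and filter with literals, so the optimizer can push them into PartitionFilters.

from pyspark.sql.functions import col

# collect the partition keys driver-side (fine when the dimension table is small)
dates = [r.snapshot_date
         for r in spark.table("dim_dates").select("snapshot_date").distinct().collect()]

# filtering with literals lets Delta prune partitions up front
pruned = spark.table("my_table").where(col("snapshot_date").isin(dates))
pruned.explain()  # PartitionFilters should now list the IN values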

by sudhanshu1, New Contributor III
  • 3858 Views
  • 4 replies
  • 2 kudos

Resolved! DLT workflow failing to read files from AWS S3

Hi all, I am trying to read streams directly from AWS S3. I set the instance profile, but when I run the workflow it fails with the error below: "No AWS Credentials provided by TemporaryAWSCredentialsProvider : shaded.databricks.org.apache.hadoop.fs.s3a.C...

Latest Reply
Vivian_Wilfred
Databricks Employee
  • 2 kudos

Hi @SUDHANSHU RAJ, is UC enabled on this workspace? What is the access mode set on the cluster? Is this coming from the metastore or directly when you read from S3? Is the S3 bucket cross-account?
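For comparison, a minimal sketch of a DLT table reading from S3 with Auto Loader, assuming the pipeline's cluster carries an instance profile with read access to the (hypothetical) bucket. No credentials appear in code, so a TemporaryAWSCredentialsProvider error usually points at the cluster or UC access configuration rather than the notebook itself.

import dlt

@dlt.table(name="raw_events", comment="Streaming ingest from S3 via Auto Loader")
def raw_events():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("s3://my-bucket/landing/events/")  # hypothetical bucket/prefix
    )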

3 More Replies
by alxsbn, Contributor
  • 3028 Views
  • 2 replies
  • 2 kudos

Resolved! Autoloader on CSV file didn't infer cells with JSON data well

Hello! I'm playing with Autoloader schema inference on a big S3 repo with 300+ tables and large CSV files. I'm looking at Autoloader with great attention, as it can be a great time saver in our ingestion process (data comes from a transactional DB gen...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 2 kudos

PySpark uses \ as the escape character by default. You can change it to " (see the CSV options docs: https://docs.databricks.com/ingestion/auto-loader/options.html#csv-options).
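A minimal sketch of that fix in an Auto Loader read, assuming the JSON cells are quoted and use doubled quotes internally (the bucket paths are hypothetical):

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("header", "true")
    .option("cloudFiles.schemaLocation", "s3://my-bucket/_schemas/orders/")  # hypothetical
    .option("escape", '"')  # default is \; JSON-in-CSV cells usually need " instead
    .load("s3://my-bucket/raw/orders/")  # hypothetical
)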

1 More Reply
by Victhor, New Contributor III
  • 8799 Views
  • 2 replies
  • 12 kudos
Latest Reply
chanshing
New Contributor III
  • 12 kudos

@Kaniz Fatma Is that tool (dbvim) still maintained? It looks like it has been abandoned and there are a couple of unresolved issues. Are there any plans to support vim keybindings in Databricks? This is possible in many other web-based editors, such a...

1 More Reply
by DeveloperAmarde, New Contributor
  • 2355 Views
  • 1 reply
  • 0 kudos

Connection to Collibra

Hi team, I want to connect to Collibra to fetch details from it. Currently we are using a username and password to connect. I would like to know the recommended practice for connecting to a Collibra account from a Databricks notebook.

Latest Reply
Debayan
Databricks Employee
  • 0 kudos

Hi, could you please check whether this helps: https://marketplace.collibra.com/listings/jdbc-driver-for-databricks/
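On the "recommended practice" part of the question: whatever driver or API you use, keep the credentials out of the notebook by storing them in a Databricks secret scope. A minimal sketch against Collibra's REST API follows; the instance URL, scope, and key names are hypothetical, and the /rest/2.0 base path is an assumption based on Collibra's v2 API.

import requests

# credentials live in a secret scope, not in the notebook
user = dbutils.secrets.get(scope="collibra", key="username")
password = dbutils.secrets.get(scope="collibra", key="password")

base_url = "https://your-instance.collibra.com/rest/2.0"  # hypothetical instance and assumed API path
resp = requests.get(f"{base_url}/assets", auth=(user, password))
resp.raise_for_status()
print(resp.json())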

by BigMF, New Contributor III
  • 2792 Views
  • 1 reply
  • 2 kudos

How do I maintain Azure Account Console access after Global Admin access is removed?

Hello, I don't know if I should have created a separate question or added to this one. I've read the documentation and, as far as I can tell, have followed it correctly, but I'm still having issues accessing the Account Console unless the user I'm loggi...

Latest Reply
Debayan
Databricks Employee
  • 2 kudos

Hi, could you please try https://accounts.azuredatabricks.net/?account_id=<account-id>, where the account ID has to be the value in the URL? Please let us know if this helps.

by lmcglone, New Contributor II
  • 7417 Views
  • 2 replies
  • 3 kudos

Comparing two dataframes and creating columns from values within a dataframe

Hi, I have a dataframe that has name and company:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()
columns = ["company", "name"]
data = [("company1", "Jon"), ("company2", "Steve"), ("company1", "...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

You need to join and pivot:
(df.join(df2, on=[df.company == df2.job_company])
   .groupBy("company", "name")
   .pivot("job_company")
   .count())
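A self-contained version of that pattern, with made-up sample data standing in for the truncated original:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pivot-example").getOrCreate()

df = spark.createDataFrame(
    [("company1", "Jon"), ("company2", "Steve"), ("company1", "Mindy")],
    ["company", "name"],
)
df2 = spark.createDataFrame([("company1",), ("company2",)], ["job_company"])

result = (
    df.join(df2, on=[df.company == df2.job_company])
      .groupBy("company", "name")
      .pivot("job_company")
      .count()
)
result.show()  # one column per job_company value, counts as cells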

1 More Reply
by shamly, New Contributor III
  • 4351 Views
  • 4 replies
  • 2 kudos

How to replace LF with ' ' in a UTF-16 encoded CSV?

I have tried several code snippets and nothing worked. An extra space or LF pushes data to the next row in my output. All rows should end in CRLF, but some rows end in LF, and while reading the CSV it does not give the correct output. My CSV has a double dagger as d...

Latest Reply
sher
Valued Contributor II
  • 2 kudos

Try this:
val df = spark.read.format("csv")
  .option("header", true)
  .option("sep", "||")
  .load("file load")
display(df)
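For the UTF-16/LF part of the question, one approach is to fix the line endings before Spark parses the file. A sketch assuming a single file small enough to process on the driver; the in/out paths are hypothetical, and the double-dagger delimiter is taken from the question:

# read the raw bytes and decode as UTF-16
raw = sc.binaryFiles("dbfs:/path/in/data.csv").values().first()
text = raw.decode("utf-16")

# protect real CRLF row endings, turn stray bare LFs into spaces, then restore CRLF
fixed = text.replace("\r\n", "\x00").replace("\n", " ").replace("\x00", "\r\n")
dbutils.fs.put("dbfs:/path/out/data_fixed.csv", fixed, overwrite=True)

df = (
    spark.read.format("csv")
    .option("header", "true")
    .option("sep", "‡")  # double dagger delimiter from the question
    .load("dbfs:/path/out/data_fixed.csv")
)
display(df)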

3 More Replies
by KVNARK, Honored Contributor II
  • 3162 Views
  • 3 replies
  • 9 kudos

Date datatype format issue in PySpark

If anyone has encountered this date type format - 6/15/25 12:00 AM - could you mention the right formatting to be used in PySpark? Thanks in advance!

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 9 kudos

It will also work without the legacy parser:
SELECT to_timestamp('6/15/23 12:00 AM', 'M/dd/yy h:mm a')
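The same thing from PySpark, for completeness (the sample value is the one from the question):

from pyspark.sql.functions import to_timestamp

df = spark.createDataFrame([("6/15/25 12:00 AM",)], ["raw"])
df.select(to_timestamp("raw", "M/dd/yy h:mm a").alias("ts")).show(truncate=False)
# -> 2025-06-15 00:00:00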

2 More Replies
by andrew0117, Contributor
  • 10914 Views
  • 1 reply
  • 0 kudos

Resolved! How to read a local file using Databricks (file stored on your own computer)

without uploading the file to DBFS? Thanks!

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

In my opinion it doesn't make sense, but... you can mount an SMB Azure file share on a Windows machine (https://learn.microsoft.com/en-us/azure/storage/files/storage-how-to-use-files-windows) and then mount the same folder on Databricks using pip install ...

by Mahesh777k, New Contributor
  • 2919 Views
  • 2 replies
  • 2 kudos

How to delete duplicate tables?

Hi everyone, I accidentally imported duplicate tables; please guide me on how to delete them using Databricks Community Edition.

Latest Reply
UmaMahesh1
Honored Contributor III
  • 2 kudos

Hi @Mahesh Babu Uppala, you can use the following method to delete only the duplicate tables:
%scala
spark.sql("""SHOW TABLES""").createOrReplaceTempView("tables")
val temp_tables = spark.sql("""select tableName from tables where tableName...
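The reply above is truncated; here is a Python sketch of the same idea, assuming the duplicate imports got a numeric suffix such as _1 (the suffix pattern is hypothetical, so print the list and check it before dropping anything):

import re

tables = [row.tableName for row in spark.sql("SHOW TABLES").collect()]

# a table counts as a "duplicate" if stripping its numeric suffix yields another existing table
dupes = [t for t in tables
         if re.search(r"_\d+$", t) and re.sub(r"_\d+$", "", t) in tables]
print("about to drop:", dupes)

for t in dupes:
    spark.sql(f"DROP TABLE IF EXISTS {t}")  # irreversible, so verify the list first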

1 More Reply
by labtech, Valued Contributor II
  • 5399 Views
  • 4 replies
  • 18 kudos

Resolved! Resource limits when creating a cluster in Databricks on AWS

Hi team, could you please help check my case? I always fail at this step. Thanks!

Latest Reply
labtech
Valued Contributor II
  • 18 kudos

Thanks for all your answers. The problem came from the AWS side. I don't know why on the first ticket they said the issue didn't come from AWS.

3 More Replies
by jamesw, New Contributor II
  • 3074 Views
  • 1 reply
  • 1 kudos

Ganglia not working with custom container services

Setup: custom Docker container starting from the "databricksruntime/gpu-conda:cuda11" base image layer; 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12); multi-node, p3.8xlarge GPU compute. When I try to view Ganglia metrics I am met with "502 Bad Gatewa...

Latest Reply
Vivian_Wilfred
Databricks Employee
  • 1 kudos

Hi @James W, Ganglia is not available for custom Docker containers by default. This is a known limitation. However, you can try this experimental support for Ganglia in custom DCS: https://github.com/databricks/containers/tree/master/experimental/ub...

