Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Ashok1
by New Contributor II
  • 1652 Views
  • 2 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable

Hey there @Ashok ch​, hope everything is going great. Does @Ivan Tang​'s response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Else please let us know if you need more hel...

1 More Replies
brickster_2018
by Databricks Employee
  • 4609 Views
  • 2 replies
  • 2 kudos

Resolved! Does Databricks have a maven repository to download the jars?

Using OSS jars always causes classpath issues when running the job on Databricks. The same job works fine on EMR/on-premises.

Latest Reply
mj2022
New Contributor III

I am following https://docs.databricks.com/dev-tools/databricks-connect.html#intellij-scala-or-java to obtain the spark-avro jar, since Databricks has its own from_avro method for use with the Kafka schema registry, but I am not able to find the spark-avro j...

1 More Replies
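For readers hitting the same classpath question: OSS PySpark ships from_avro in pyspark.sql.avro.functions once the matching org.apache.spark:spark-avro package is attached to the cluster. A minimal sketch, assuming a hypothetical Kafka topic and an inline Avro schema (the Databricks-specific from_avro variant that talks to a schema registry has a different, Scala-side signature):

    from pyspark.sql.functions import col
    from pyspark.sql.avro.functions import from_avro

    # Placeholder Avro schema as a JSON string
    schema = '{"type": "record", "name": "Event", "fields": [{"name": "id", "type": "string"}]}'

    df = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
          .option("subscribe", "events")                     # placeholder topic
          .load())

    # Decode the Kafka value column with the OSS from_avro
    decoded = df.select(from_avro(col("value"), schema).alias("event"))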
jay548
by New Contributor
  • 1479 Views
  • 0 replies
  • 0 kudos

ERROR yarn.ApplicationMaster: Wrong FS s3:// expected s3a://

We migrated from HDP to Cloudera platform 7, and everything works except when we try to use Databricks with Redshift to load data into a Redshift table. We get the following error: ERROR yarn.ApplicationMaster: User class threw exception: java.lang....

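For anyone hitting the same "Wrong FS" error: a commonly suggested fix is to point the legacy s3:// scheme at the S3A connector, or simply rewrite the paths to s3a:// before handing them to the Redshift connector. A hedged sketch using standard Hadoop properties; the bucket name is a placeholder:

    # Map s3:// onto the S3A filesystem implementation for this session
    spark.sparkContext._jsc.hadoopConfiguration().set(
        "fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")

    # Alternatively, rewrite the tempdir/path before passing it on
    # (placeholder bucket):
    tempdir = "s3://my-bucket/tmp".replace("s3://", "s3a://")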
GeorgeP
by New Contributor II
  • 2241 Views
  • 2 replies
  • 2 kudos

Errors when querying Azure Databricks through DBeaver on macOS

I configured DBeaver to work with either the latest Databricks driver or Simba. I can connect and see databases, schemas, tables and columns. However, when a SELECT statement is executed, 30-40 seconds go by before I get the following error message: SQL...

Latest Reply
sage5616
Valued Contributor

Has this issue been resolved? @aravhish's solution did not help me. Any other options? I am experiencing the exact same issue with the same configuration on a Mac. Any help would be appreciated.

1 More Replies
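One way to narrow down whether the timeout is DBeaver/driver-side or endpoint-side is to run the same query with the databricks-sql-connector Python package (pip install databricks-sql-connector); hostname, HTTP path and token below are placeholders taken from the cluster's JDBC/ODBC tab:

    from databricks import sql

    with sql.connect(
            server_hostname="adb-1234567890123456.7.azuredatabricks.net",  # placeholder
            http_path="sql/protocolv1/o/0/0101-123456-abcdefgh",           # placeholder
            access_token="dapiXXXXXXXX") as conn:                          # placeholder
        with conn.cursor() as cursor:
            cursor.execute("SELECT 1")
            # if this returns quickly, suspect the DBeaver/driver configuration
            print(cursor.fetchall())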
amichel
by New Contributor III
  • 9190 Views
  • 3 replies
  • 4 kudos

Resolved! Recommended way to integrate MongoDB as a streaming source

Current state: data is stored in MongoDB Atlas, which is used extensively by all services; the data lake is hosted in the same AWS region and connected to MongoDB over a private link. Requirements: streaming pipelines that continuously ingest, transform/analyze and ...

Latest Reply
robwma
New Contributor III

Another option, if you'd like to use Spark for ingestion, is the new Spark Connector V10.0, which supports Spark Structured Streaming: https://www.mongodb.com/developer/languages/python/streaming-data-apache-spark-mongodb/. If you use Kafka, th...

2 More Replies
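As a rough illustration of the v10 connector's streaming read, assuming org.mongodb.spark:mongo-spark-connector_2.12:10.x is installed on the cluster; the URI, database, collection and schema are placeholders:

    from pyspark.sql.types import StructType, StructField, StringType

    # Streaming reads from MongoDB expect an explicit schema (placeholder here)
    schema = StructType([StructField("_id", StringType(), True),
                         StructField("payload", StringType(), True)])

    stream = (spark.readStream
              .format("mongodb")
              .schema(schema)
              .option("spark.mongodb.connection.uri",
                      "mongodb+srv://user:pass@cluster0.example.net")  # placeholder
              .option("spark.mongodb.database", "mydb")                # placeholder
              .option("spark.mongodb.collection", "events")            # placeholder
              .load())

    query = (stream.writeStream
             .format("delta")
             .option("checkpointLocation", "/tmp/checkpoints/mongo_events")
             .start("/tmp/delta/mongo_events"))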
Lonnie
by New Contributor
  • 2356 Views
  • 0 replies
  • 0 kudos

Recommended Redshift-2-Delta Migration Path

Hello all! My team is previewing Databricks and is contemplating the steps to take to perform one-time migrations of datasets from Redshift to Delta. Based on our understanding of the tool, here are our initial thoughts: export data from Redshift-2-S...

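For context, a hedged sketch of the S3-staged path the post describes: UNLOAD on the Redshift side, then convert the staged Parquet files to Delta (bucket, role and paths are placeholders):

    # Assumes Redshift has already UNLOADed the dataset to S3 as Parquet, e.g.:
    #   UNLOAD ('SELECT * FROM my_table')
    #   TO 's3://my-bucket/staging/my_table/'
    #   IAM_ROLE 'arn:aws:iam::123456789012:role/my-role' FORMAT AS PARQUET;
    df = spark.read.parquet("s3a://my-bucket/staging/my_table/")

    (df.write
       .format("delta")
       .mode("overwrite")
       .save("s3a://my-bucket/delta/my_table/"))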
Databricks_7045
by New Contributor III
  • 4603 Views
  • 2 replies
  • 4 kudos

Resolved! Connecting Delta Tables from any Tools

Hi team, to access SQL tables we use tools like TOAD and SQL Server Management Studio (SSMS). Is there a tool to connect to and access Databricks Delta tables? Please let us know. Thank you.

Latest Reply
Anonymous
Not applicable

Hi @Rajesh Vinukonda​, hope you are doing well. Thanks for sending in your question. Were you able to find a solution to your query?

1 More Replies
Constantine
by Contributor III
  • 2542 Views
  • 1 reply
  • 5 kudos

Resolved! Delta Table created on s3 has all null values

I have data in a Spark DataFrame and I write it to an S3 location. It has some complex datatypes like structs etc. When I create the table on top of the S3 location using CREATE TABLE IF NOT EXISTS table_name USING DELTA LOCATION 's3://.../...'; th...

Latest Reply
Hubert-Dudek
Esteemed Contributor III

@John Constantine​, try to load it as a DataFrame (spark.read.delta(path)) and validate what is loading. It could be easier to mount the S3 location as a folder to ensure that all data is there (dbutils or %fs to check) and that the connection is workin...

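In PySpark the equivalent check looks like this (the Scala-style spark.read.delta shorthand needs the io.delta implicits); the path is a placeholder:

    df = spark.read.format("delta").load("s3://my-bucket/path/to/table")
    df.printSchema()              # verify the struct columns survived the write
    df.show(10, truncate=False)   # eyeball whether the values are really null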
Constantine
by Contributor III
  • 2926 Views
  • 1 reply
  • 4 kudos

Resolved! How to process a large delta table with UDF ?

I have a Delta table with about 300 billion rows. Now I am performing some operations on a column using a UDF and creating another column. My code is something like this:

    def my_udf(data):
        return ...

    udf_func = udf(my_udf, StringType())
    data...

Latest Reply
Hubert-Dudek
Esteemed Contributor III

A plain Python UDF like that serializes rows between the JVM and the Python worker one at a time, so better not to use it for such a big dataset. What you need is a vectorized pandas UDF: https://docs.databricks.com/spark/latest/spark-sql/udf-python-pandas.html

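A minimal vectorized pandas UDF sketch along the lines of the reply; my_transform and the column names are placeholders for whatever the original my_udf did:

    import pandas as pd
    from pyspark.sql.functions import pandas_udf
    from pyspark.sql.types import StringType

    @pandas_udf(StringType())
    def my_transform(batch: pd.Series) -> pd.Series:
        # operates on a whole Arrow batch at once instead of row by row
        return batch.astype(str)

    data = data.withColumn("new_column", my_transform("existing_column"))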
george2020
by New Contributor II
  • 1341 Views
  • 0 replies
  • 2 kudos

Using the Databricks Repos API to bring Repo in top-level production folder to latest version

I am having an issue with a GitHub Actions workflow using the Databricks Repos API. We want the API call in the GitHub Action to bring the Repo in our Databricks Repos top-level folder to the latest version on a merge into the main branch. The GitHub Actio...

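For reference, the Repos API call such a workflow needs is a PATCH that pulls the workspace repo to the head of the merged branch; the host, token and repo_id below are placeholders (repo_id can be looked up via GET /api/2.0/repos):

    import requests

    host = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
    token = "dapiXXXXXXXX"                                       # placeholder
    repo_id = "1234567890"                                       # placeholder

    resp = requests.patch(
        f"{host}/api/2.0/repos/{repo_id}",
        headers={"Authorization": f"Bearer {token}"},
        json={"branch": "main"},  # fast-forwards the repo to the latest main
    )
    resp.raise_for_status()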
wyzer
by Contributor II
  • 4449 Views
  • 2 replies
  • 2 kudos

Resolved! Insert data into an on-premise SQL Server

Hello, is it possible to insert data from Databricks into an on-premises SQL Server? Thanks.

Latest Reply
wyzer
Contributor II

Hello, yes, we found out how to do it by installing a JDBC connector. It works fine. Thanks.

1 More Replies
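For anyone with the same question, a minimal JDBC write sketch, assuming the SQL Server JDBC driver is installed on the cluster and the on-premises host is reachable from the workspace network (host, database, table and credentials are placeholders):

    (df.write
       .format("jdbc")
       .option("url", "jdbc:sqlserver://onprem-host:1433;databaseName=mydb")
       .option("dbtable", "dbo.my_table")
       .option("user", "my_user")
       .option("password", "my_password")
       .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
       .mode("append")
       .save())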
alejandrofm
by Valued Contributor
  • 3481 Views
  • 4 replies
  • 5 kudos

Resolved! Show Vacuum operation result (files deleted) without DRY RUN

Hi, I'm running some scheduled VACUUM jobs and would like to know how many files were deleted without doing all the computation twice (with and without DRY RUN). Is there a way to accomplish this? Thanks!

Latest Reply
RKNutalapati
Valued Contributor

We have to enable logging to capture the logs for VACUUM:

    spark.conf.set("spark.databricks.delta.vacuum.logging.enabled", "true")

3 More Replies
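With that logging flag enabled, the VACUUM END entry in the table history carries the deleted-file count, so a single run is enough. A hedged sketch with a placeholder table name:

    spark.conf.set("spark.databricks.delta.vacuum.logging.enabled", "true")
    spark.sql("VACUUM my_table")

    # The VACUUM END history entry includes operationMetrics such as numDeletedFiles
    (spark.sql("DESCRIBE HISTORY my_table")
         .filter("operation = 'VACUUM END'")
         .select("timestamp", "operationMetrics")
         .show(truncate=False))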
Constantine
by Contributor III
  • 4733 Views
  • 1 reply
  • 2 kudos

Resolved! OPTIMIZE throws an error after doing MERGE on the table

I have a table on which I do an upsert, i.e. MERGE INTO table_name ..., after which I run OPTIMIZE table_name, which throws an error: java.util.concurrent.ExecutionException: io.delta.exceptions.ConcurrentDeleteReadException: This transaction attempted to read...

Latest Reply
Hubert-Dudek
Esteemed Contributor III

You can try to change the isolation level: https://docs.microsoft.com/en-us/azure/databricks/delta/optimizations/isolation-level. In a MERGE it is good to specify all partitions in the merge condition. The error can also happen when the script is running concurrently.

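The isolation-level change the reply points to is a Delta table property; a hedged sketch with a placeholder table name:

    # WriteSerializable is the weaker level and reduces conflicts
    # between concurrent MERGE and OPTIMIZE operations
    spark.sql("""
        ALTER TABLE my_table
        SET TBLPROPERTIES ('delta.isolationLevel' = 'WriteSerializable')
    """)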
wyzer
by Contributor II
  • 3965 Views
  • 2 replies
  • 4 kudos

Resolved! How to show the properties of the folders/files from DBFS ?

Hello, how can I show the properties of the folders/files in DBFS? Currently I am using this command: display(dbutils.fs.ls("dbfs:/")), but it only shows path, name, and size. How can I show these properties: CreatedBy (Name), CreatedOn (Date), ModifiedBy (Name), Modi...

Latest Reply
Hubert-Dudek
Esteemed Contributor III

One idea is to use the %sh magic command, but there is no name (just root).

1 More Replies
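DBFS does not track created-by/modified-by users, but modification times are reachable through the /dbfs FUSE mount (and newer runtimes also expose a modificationTime field in dbutils.fs.ls output). A small sketch with a placeholder path:

    import datetime
    import os

    st = os.stat("/dbfs/tmp/my_file.csv")                 # placeholder path
    print(datetime.datetime.fromtimestamp(st.st_mtime))   # last-modified time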
SettlerOfCatan
by New Contributor II
  • 6239 Views
  • 0 replies
  • 1 kudos

Access data within the blob storage without downloading

Our customer is using Azure's blob storage service to save big files so that we can work with them using an Azure online service, like Databricks. We want to read and work with these files with a computing resource obtained by Azure directly, without d...

blob-storage Azure-ML fileytypes blob
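A hedged sketch of reading blobs in place from Databricks with a storage account key pulled from a secret scope; the account, container, scope and path are placeholders:

    spark.conf.set(
        "fs.azure.account.key.mystorageacct.blob.core.windows.net",
        dbutils.secrets.get(scope="my-scope", key="storage-key"))

    # Read the blob directly over wasbs:// without downloading it first
    df = spark.read.csv(
        "wasbs://mycontainer@mystorageacct.blob.core.windows.net/path/data.csv",
        header=True)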