Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Hubert-Dudek
by Databricks MVP
  • 25130 Views
  • 23 replies
  • 36 kudos

Resolved! SparkFiles - strange behavior on Azure databricks (runtime 10)

When you use from pyspark import SparkFiles and spark.sparkContext.addFile(url), the file is added to the non-DBFS path /local_disk0/, but when you then try to read the file with spark.read.json(SparkFiles.get("file_name")), Spark wants to read it from /dbfs/local_disk0/. I tried als...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 36 kudos

I confirm that, as @Arvind Ravish said, adding the file:/// prefix solves the problem.
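A minimal sketch of that fix, assuming a Databricks notebook where spark is the active SparkSession; the helper name and the file name are illustrative:

```python
# SparkFiles.get() returns a driver-local path under /local_disk0/;
# prefixing it with file:// stops Spark from resolving it under /dbfs.

def to_local_uri(path: str) -> str:
    """Turn a driver-local path into a file:// URI."""
    return path if path.startswith("file:") else "file://" + path

# In a Databricks notebook (illustrative, not run here):
# from pyspark import SparkFiles
# spark.sparkContext.addFile("https://example.com/file_name.json")
# df = spark.read.json(to_local_uri(SparkFiles.get("file_name.json")))
```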

22 More Replies
Vegard_Stikbakk
by New Contributor II
  • 3801 Views
  • 1 reply
  • 3 kudos

Resolved! External functions on a SQL endpoint

I want to create an external function using CREATE FUNCTION (External) and expose it to users of my SQL endpoint. Although this works from a SQL notebook, when I try to use the function from a SQL endpoint, I get "User defined expression is not supporte...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 3 kudos

SQL endpoints use a separate runtime (https://docs.databricks.com/sql/release-notes/index.html#channels), so it seems this is not yet supported. There is CREATE FUNCTION documentation, but it appears to support only SQL syntax https://docs.databricks.com/sql...

dataguy73
by New Contributor
  • 3833 Views
  • 1 reply
  • 1 kudos

Resolved! spark properties files

I am trying to migrate a Spark job from an on-premises Hadoop cluster to Databricks on Azure. Currently, we keep many values in a properties file. When executing spark-submit we pass the parameter --properties /prop.file.txt, and inside t...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

I use JSON files and .conf files which reside on the data lake or in the FileStore of DBFS, then read those files using Python/Scala.

BasavarajAngadi
by Contributor
  • 2499 Views
  • 1 reply
  • 4 kudos

Resolved! Hi Experts: I am new to Databricks, please help me with the below. Question: How is a delta table stored in DBFS?

If I create a delta table, is the table stored in Parquet format in the DBFS location? And please share how the Parquet files support schema evolution if I do DML operations. As per my understanding: we read data from the data lake first into a data frame and try to...

Latest Reply
-werners-
Esteemed Contributor III
  • 4 kudos

Delta Lake is Parquet on steroids. The actual data is stored in Parquet files, but you get a bunch of extra functionality (time travel, ACID, optimized writes, MERGE, etc.). Check this page for lots of info. Delta Lake does support schema evolution...
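A hedged sketch of that schema evolution (the table path is hypothetical): appending a DataFrame that carries new columns works once mergeSchema is enabled on the write.

```python
# Option that lets an append add new columns to an existing Delta table.
delta_write_options = {"mergeSchema": "true"}

# On Databricks (illustrative, not run here):
# (df_with_new_column.write.format("delta")
#      .options(**delta_write_options)
#      .mode("append")
#      .save("/mnt/delta/my_table"))
```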

BarakHav
by New Contributor II
  • 1873 Views
  • 0 replies
  • 3 kudos

Automatically Vacuuming a Delta Table on Databricks

Hi all, I've recently checked my bucket size on AWS and saw that its size doesn't make much sense. I decided to vacuum each delta table with 2 weeks of retention. That shrunk the data from 30TB to around 5TB, though I was wondering, shouldn't default...
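For context, VACUUM's RETAIN clause takes hours, so "2 weeks of retention" translates like this sketch (the table name is hypothetical; the default retention is 7 days, i.e. 168 hours):

```python
def retention_hours(days: int) -> int:
    """Convert a retention window in days to the hours VACUUM expects."""
    return days * 24

# On Databricks (illustrative, table name hypothetical):
# spark.sql(f"VACUUM my_db.my_table RETAIN {retention_hours(14)} HOURS")
```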

Gvsmao
by New Contributor III
  • 12294 Views
  • 7 replies
  • 3 kudos

Resolved! SQL Databricks - Spot VMs (Cost Optimized)

Hello! I want to ask a question, please! Referring to Spot VMs with the "Cost Optimized" setting: in the case of an X-Small endpoint, which has 2 workers, if I send 10 simultaneous queries and a worker is evicted, can I get an error in any of these querie...

6 More Replies
Krish123
by New Contributor
  • 2006 Views
  • 0 replies
  • 0 kudos

mount a Azure DL in Databricks

Hello Team, I am quite new to Databricks and I am learning PySpark and Databricks. I am trying to mount a DL Gen2 in Databricks; as part of that I created an app registration, added the DL into the app registration permissions, created a secret, and also adde...
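For anyone at the same step, the usual OAuth (client credentials) mount looks roughly like this sketch; every ID, the secret, the container, and the storage account name are placeholders:

```python
# OAuth client-credentials configs for mounting ADLS Gen2
# (all angle-bracket values are placeholders).
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-client-id>",
    "fs.azure.account.oauth2.client.secret": "<secret-from-dbutils.secrets.get>",
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# In a notebook (illustrative, not run here):
# dbutils.fs.mount(
#     source="abfss://<container>@<storage-account>.dfs.core.windows.net/",
#     mount_point="/mnt/datalake",
#     extra_configs=configs)
```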

Hubert-Dudek
by Databricks MVP
  • 2014 Views
  • 0 replies
  • 24 kudos

Delta time travel - recovering an unconditional delete

Delta time travel - recovering an unconditional delete. Recovery is a great feature of Delta. Let's check with a real example how the recovery option works. Please watch my new YouTube video about that topic: https://www.youtube.com/watch?v=TrUT6pvFKic

shan_chandra
by Databricks Employee
  • 5519 Views
  • 1 reply
  • 2 kudos

Resolved! java.lang.ArithmeticException: Casting XXXXXXXXXXX to int causes overflow

My job started failing with the below error when inserting rows (timestamp) into a delta table; it was working well before. java.lang.ArithmeticException: Casting XXXXXXXXXXX to int cau...

Latest Reply
shan_chandra
Databricks Employee
  • 2 kudos

This is because the Integer type represents 4-byte signed integer numbers, with a range from -2147483648 to 2147483647. Kindly use double as the data type to insert the "2147483648" value in the delta table. In the below example, the second ...
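The overflow boundary can be checked directly; a plain sketch:

```python
# Spark's IntegerType is a 4-byte signed int; values outside this range
# overflow on cast.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def fits_int32(value: int) -> bool:
    return INT32_MIN <= value <= INT32_MAX

# fits_int32(2147483647) is True, fits_int32(2147483648) is False,
# so the column needs a wider type (e.g. LongType/BIGINT, or double
# as suggested above).
```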

Bency
by New Contributor III
  • 3039 Views
  • 3 replies
  • 2 kudos

Invalid field schema option provided-DatabricksDeltaLakeSinkConnector

I have configured a Delta Lake Sink connector which reads from an AVRO topic and writes to the Delta lake. I have followed the docs and my config looks like below: { "name": "dev_test_delta_connector", "config": { "topics": "dl_test_avro", "inp...

Latest Reply
Bency
New Contributor III
  • 2 kudos

@Hubert Dudek, should I be configuring anything with respect to schema in the connector config? I did successfully stage some data from another topic of a different format (JSON_SR) into a delta lake table, but it's with the AVRO topic that I ge...

2 More Replies
User16826992666
by Databricks Employee
  • 3965 Views
  • 2 replies
  • 1 kudos

Resolved! As an admin of a Databricks SQL environment, can I cancel long running queries?

I don't want one long or poorly written query to block my entire SQL endpoint for everyone else. Do I have the ability to kill specific queries?

Latest Reply
DevB
New Contributor II
  • 1 kudos

Is there a way to stop the session programmatically, like "kill session_id" or something similar in the API?
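Assuming the statement was submitted through the Databricks SQL Statement Execution API, that API exposes a cancel endpoint; a sketch where the workspace host, token, and statement ID are all placeholders:

```python
def cancel_statement_url(host: str, statement_id: str) -> str:
    """Build the SQL Statement Execution API cancel endpoint URL
    (host and statement_id are caller-supplied placeholders)."""
    return f"https://{host}/api/2.0/sql/statements/{statement_id}/cancel"

# Illustrative call (not run here):
# import requests
# requests.post(cancel_statement_url("<workspace-host>", "<statement-id>"),
#               headers={"Authorization": "Bearer <token>"})
```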

1 More Replies
Bency
by New Contributor III
  • 9365 Views
  • 6 replies
  • 4 kudos

Resolved! Databricks Delta Lake Sink Connector

I am trying to use the Databricks Delta Lake Sink Connector (Confluent Cloud) to write to S3. The connector fails on startup with the following error; any help would be appreciated. org.apache.kafka.connect.errors.ConnectException: java.sql.SQLExcepti...

Latest Reply
Bency
New Contributor III
  • 4 kudos

Hi @Kaniz Fatma, yes we did. Looks like it was indeed a whitelisting issue. Thanks @Hubert Dudek @Kaniz Fatma

5 More Replies
Mike_Gardner
by New Contributor II
  • 2882 Views
  • 1 reply
  • 3 kudos

Resolved! Data Cache in Serverless SQL Endpoints vs Non-Serverless SQL Endpoints

Do Serverless SQL Endpoints benefit from Delta and Spark Cache? If so, does it differ from a non-serverless endpoints? How long does the cache last?

Latest Reply
Hubert-Dudek
Databricks MVP
  • 3 kudos

All SQL endpoints have the delta cache enabled out of the box (in fact, 2X-Small etc. run on E8/E16-series instances, which are delta cache enabled). The delta cache is managed dynamically, so cached data stays as long as there is free RAM for it.

tz1
by New Contributor III
  • 24437 Views
  • 7 replies
  • 6 kudos

Resolved! Problem with Databricks JDBC connection: Error occured while deserializing arrow data

I have a Java program like this to test out the Databricks JDBC connection with the Databricks JDBC driver: Connection connection = null; try { Class.forName(driver); connection = DriverManager.getConnection(url...

Latest Reply
Alice__Caterpil
New Contributor III
  • 6 kudos

Hi @Jose Gonzalez, this similar issue with Snowflake's JDBC driver is a good reference. I was able to get this to work on Java OpenJDK 17 by specifying this JVM option: --add-opens=java.base/java.nio=ALL-UNNAMED. Although I came across another issue with...

6 More Replies
Constantine
by Contributor III
  • 3379 Views
  • 1 reply
  • 4 kudos

Resolved! How to process a large delta table with UDF ?

I have a delta table with about 300 billion rows. Now I am performing some operations on a column using a UDF and creating another column. My code is something like this: def my_udf(data): return pass udf_func = udf(my_udf, StringType()) data...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 4 kudos

A plain Python UDF like that processes rows one at a time with heavy serialization overhead, so better not to use it for such a big dataset. What you need is a vectorized pandas UDF: https://docs.databricks.com/spark/latest/spark-sql/udf-python-pandas.html
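A minimal sketch of the vectorized approach (function and column names are illustrative): the core batch transformation is plain pandas, and it gets wrapped with pandas_udf on the cluster.

```python
import pandas as pd

def upper_batch(col: pd.Series) -> pd.Series:
    """Operates on whole batches of rows at once instead of row-by-row."""
    return col.astype(str).str.upper()

# On Databricks (illustrative, not run here):
# from pyspark.sql.functions import pandas_udf
# from pyspark.sql.types import StringType
# upper_udf = pandas_udf(upper_batch, returnType=StringType())
# data = data.withColumn("new_column", upper_udf("some_column"))
```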
