Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Paxi
by New Contributor
  • 232 Views
  • 0 replies
  • 0 kudos

Maven libs often failed during installation

Dear Community, I have a Databricks compute to which I added two Maven libs from a custom Nexus repository (because of a company policy, Databricks cannot communicate with the public internet, so I must use a private Nexus repo through a firewall). Sin...

udi_azulay
by New Contributor II
  • 599 Views
  • 2 replies
  • 1 kudos

Variant type table within DLT

Hi, I have a table with the Variant type (preview) that works well in 15.3. When I try to run code that references this Variant type in a DLT pipeline, I get: com.databricks.sql.transaction.tahoe.DeltaUnsupportedTableFeatureException: [DELTA_UNSUPPORTED_F...

Latest Reply
thomas-totter
New Contributor II
  • 1 kudos

The preview channel is currently at 15.2, so we should be only one minor version increment away from Variant being available in DLT (at least I hope so...).

1 More Replies
koantek_user
by New Contributor
  • 375 Views
  • 1 replies
  • 0 kudos

geometric functions in databricks

Hi All, We are working on a migration project from Snowflake to Databricks, and some of the Snowflake scripts use geometric functions such as st_makepoint and st_geohash, which we need to convert to Databricks. Has someone encountered this ...

Latest Reply
szymon_dybczak
Contributor III
  • 0 kudos

Hi @koantek_user, look at the example notebook below: https://www.databricks.com/notebooks/geomesa-h3-notebook.html You can find more information on processing geospatial data in Databricks here: https://www.advancinganalytics.co.uk/blog/2023/1/3/gis-in-dat...

raghunathr
by New Contributor III
  • 937 Views
  • 2 replies
  • 0 kudos

Service account access granted, but still getting "User does not have USE SCHEMA on Schema"

Hi All, We have run into a scenario where Azure Data Factory connects to Azure Databricks through linked services, using a System Assigned Managed Identity (SAMI). The specific SAMI has been added to the compute and to Unity Catalog for usage. s...

Data Engineering
azure_data_factory
azure_databricks
grants
permission_issue
unity_catlog
Latest Reply
raghunathr
New Contributor III
  • 0 kudos

We still have trouble with the external storage location. The specific Managed Identity that was added to the Databricks resource now has everything it needs for Unity Catalog DEV/Tables. However, even though that SPN was added to the External Location, we are still getting the error as...

1 More Replies
mv-rs
by New Contributor
  • 411 Views
  • 0 replies
  • 0 kudos

Structured streaming not working with Serverless compute

Hi, I have a structured streaming process that works on normal compute, but when I try to run it on Serverless, the pipeline fails with the error seen in the image below. CONTEXT: I have a Git repo with two folders,...

mahfooziiitian
by New Contributor II
  • 936 Views
  • 3 replies
  • 0 kudos

Get saved query by name using REST API or Databricks SDK

Hi All, I want to get a saved query by name using the REST API or the Databricks SDK. I did not find any direct endpoint or method that returns a saved query by name. I have one solution, as given below: get the list of all queries; filter my queries...

Data Engineering
python
REST API
Saved Queries
Latest Reply
szymon_dybczak
Contributor III
  • 0 kudos

Hi @mahfooziiitian, the answer is no: currently you can get a saved query only by ID. If you are afraid of exceeding concurrent calls, design a process that, as a first step, uses the list queries endpoint to extract endpoint IDs and names and save...

2 More Replies
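
The approach suggested in the reply above (list the queries once, keep the names and IDs, then look up by name locally) could be sketched roughly as follows. This is only an illustration: the `name` and `id` field names, the sample records, and both helper functions are assumptions, and the records stand in for whatever the list queries endpoint actually returns in your workspace.

```python
from typing import Dict, Iterable, List, Mapping, Optional


def build_name_index(queries: Iterable[Mapping[str, str]]) -> Dict[str, List[str]]:
    """Map each query name to the IDs of every saved query with that name."""
    index: Dict[str, List[str]] = {}
    for query in queries:
        # Query names are not assumed to be unique, so keep all matching IDs.
        index.setdefault(query["name"], []).append(query["id"])
    return index


def find_query_id(index: Mapping[str, List[str]], name: str) -> Optional[str]:
    """Return the single ID for `name`, None if absent, or raise if ambiguous."""
    ids = index.get(name, [])
    if len(ids) > 1:
        raise ValueError(f"{len(ids)} saved queries are named {name!r}")
    return ids[0] if ids else None


# Hypothetical records standing in for the result of one listing call.
listed_queries = [
    {"id": "q-001", "name": "daily_sales"},
    {"id": "q-002", "name": "weekly_churn"},
]

index = build_name_index(listed_queries)
daily_sales_id = find_query_id(index, "daily_sales")
```

Building the index once and reusing it keeps the number of API calls at one listing pass plus one get-by-ID call per lookup, which is the point of the suggestion about concurrent calls.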
gb_dbx
by New Contributor II
  • 488 Views
  • 3 replies
  • 2 kudos

Does Databricks plan to create a Python API for the COPY INTO Spark SQL statement in the future?

Hi, I am wondering whether Databricks plans to create a Python API for Spark SQL's COPY INTO statement. In my company we created a kind of Python wrapper around the SQL COPY INTO statement, but it has lots of design issues and is hard to maintain. I ...

Latest Reply
gb_dbx
New Contributor II
  • 2 kudos

Okay, maybe I should take a look at Auto Loader then. I didn't know Auto Loader could basically do the same as COPY INTO; I originally thought it was only used for streaming and not batch ingestion. And Auto Loader has a dedicated Python API then? And ...

2 More Replies
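
On the question in the reply above about using Auto Loader for batch-style ingestion from Python: Auto Loader is driven through the regular Structured Streaming reader with the `cloudFiles` source, and an `availableNow` trigger processes the files currently present and then stops. The sketch below is a rough illustration under assumptions: the function names, paths, and table name are hypothetical, and `spark` is expected to be an active session on a Databricks runtime where the `cloudFiles` source is available.

```python
def autoloader_options(file_format: str, schema_location: str) -> dict:
    """Build the reader options for the cloudFiles (Auto Loader) source."""
    return {
        "cloudFiles.format": file_format,
        # Location where the inferred schema is tracked between runs.
        "cloudFiles.schemaLocation": schema_location,
    }


def ingest_available_files(spark, source_path, target_table, checkpoint_path,
                           file_format="json"):
    """Load files not yet ingested from source_path into target_table, then stop."""
    reader = (
        spark.readStream
        .format("cloudFiles")
        .options(**autoloader_options(file_format, checkpoint_path))
    )
    return (
        reader.load(source_path)
        .writeStream
        # The checkpoint records which files were already ingested.
        .option("checkpointLocation", checkpoint_path)
        # availableNow: process everything currently available, then terminate.
        .trigger(availableNow=True)
        .toTable(target_table)
    )


# Hypothetical values, shown only to illustrate the option keys.
options = autoloader_options("json", "/tmp/checkpoints/events")
```

Calling `ingest_available_files(spark, "/landing/events", "bronze.events", "/tmp/checkpoints/events")` from a scheduled job would give incremental, run-to-completion ingestion that is comparable in spirit to repeated COPY INTO calls.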
biafch
by Contributor
  • 446 Views
  • 2 replies
  • 2 kudos

How to load a json file in pyspark with colon character in folder name

Hi, I have a folder that contains subfolders that hold JSON files. My subfolders look like this: 2024-08-12T09:34:37:452Z and 2024-08-12T09:25:45:185Z. I assign these subfolder names to a variable called FolderName and then try to read my JSON file like this: d...

Latest Reply
szymon_dybczak
Contributor III
  • 2 kudos

Hi @biafch, I've tried to replicate your example and it worked for me. But it seems to be a common problem, and some object storage may not support it: [HADOOP-14217] Object Storage: support colon in object path - ASF JIRA (apache.org). Which object...

1 More Replies
xhead
by New Contributor II
  • 16059 Views
  • 5 replies
  • 2 kudos

Does "databricks bundle deploy" clean up old files?

I'm looking at this page (Databricks Asset Bundles development work tasks) in the Databricks documentation. When repo assets are deployed to a Databricks workspace, it is not clear whether "databricks bundle deploy" will remove files from the target wo...

Data Engineering
bundle
cli
deploy
Latest Reply
xhead
New Contributor II
  • 2 kudos

One further question: The purpose of “databricks bundle destroy” is to remove all previously-deployed jobs, pipelines, and artifacts that are defined in the bundle configuration files. Which bundle configuration files? The ones in the repo? Or are ther...

4 More Replies
sarguido
by New Contributor II
  • 2902 Views
  • 5 replies
  • 2 kudos

Delta Live Tables: bulk import of historical data?

Hello! I'm very new to working with Delta Live Tables and I'm having some issues. I'm trying to import a large amount of historical data into DLT. However, letting the DLT pipeline run forever doesn't work with the database we're trying to import from...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Sarah Guido, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers y...

4 More Replies
bulbur
by New Contributor II
  • 639 Views
  • 1 replies
  • 0 kudos

Use pandas in DLT pipeline

Hi, I am trying to work with pandas in a Delta Live Table. I have created some example code:
import pandas as pd
import pyspark.sql.functions as F
pdf = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "...

Latest Reply
bulbur
New Contributor II
  • 0 kudos

I followed the advice given in the documentation ("However, you can include these functions outside of table or view function definitions because this code is run once during the graph initialization phase.") and moved the toPandas call to a function...

Devsh_on_point
by New Contributor
  • 264 Views
  • 1 replies
  • 1 kudos

Liquid Clustering with Partitioning

Hi Team, Can we use partitioning and liquid clustering in conjunction? Essentially, partitioning the table first on a specific field and then applying liquid clustering (on other fields)? Alternatively, can we define the order priority of the cluster key ...

Latest Reply
szymon_dybczak
Contributor III
  • 1 kudos

Hi @Devsh_on_point, no, you can't have both partitioning and liquid clustering on the same table. You can treat liquid clustering as a more performant replacement for partitioning. And yes, you are correct: the order of the cluster columns doesn't matter: "Databricks recomm...

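
To make the point in the reply above concrete, here is a small sketch that builds a `CREATE TABLE` statement with either a `CLUSTER BY` clause (liquid clustering) or a `PARTITIONED BY` clause, and refuses to combine them. The helper name, table name, and columns are hypothetical, and the helper only assembles a string; it does not run anything against a workspace.

```python
from typing import Mapping, Sequence


def create_table_ddl(table: str,
                     columns: Mapping[str, str],
                     cluster_by: Sequence[str] = (),
                     partition_by: Sequence[str] = ()) -> str:
    """Build a CREATE TABLE statement with clustering or partitioning, not both."""
    if cluster_by and partition_by:
        # Mirrors the reply: the two layouts cannot be used on the same table.
        raise ValueError("choose either cluster_by or partition_by, not both")
    column_list = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    ddl = f"CREATE TABLE {table} ({column_list})"
    if cluster_by:
        ddl += f" CLUSTER BY ({', '.join(cluster_by)})"
    elif partition_by:
        ddl += f" PARTITIONED BY ({', '.join(partition_by)})"
    return ddl


# Hypothetical table definition used only for illustration.
ddl = create_table_ddl(
    "sales_events",
    {"event_id": "BIGINT", "region": "STRING", "event_date": "DATE"},
    cluster_by=["region", "event_date"],
)
```

The resulting string could be passed to `spark.sql(ddl)` on a runtime that supports liquid clustering. Since the reply notes that the order of cluster columns doesn't matter, `["event_date", "region"]` would be an equivalent choice.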
Prashanth24
by New Contributor III
  • 1120 Views
  • 5 replies
  • 1 kudos

Resolved! Difference between Liquid clustering and Z-ordering

I am trying to understand the difference between liquid clustering and Z-ordering. As I understand it, both store the clustered data in ZCubes, which are 100 GB in size. Liquid clustering maintains the ZCube ID in the transaction log, so when opti...

Latest Reply
Brahmareddy
Valued Contributor II
  • 1 kudos

Hi Prashanth, Liquid Clustering only reorganizes the parts of the data that aren't already clustered, which makes it more efficient. Z-Ordering, on the other hand, reorganizes the entire table or partitions every time, which is more resource-intensive.

4 More Replies
vannipart
by New Contributor III
  • 494 Views
  • 1 replies
  • 1 kudos

Resolved! SparkOutOfMemoryError when merging data into a table that already has data

Hello, there is an issue with merging data from a dataframe into a table 2024 databricks: Job aborted due to stage failure: Task 17 in stage 1770.0 failed 4 times, most recent failure: Lost task 17.3 in stage 1770.0 (TID 1669) (1x.xx.xx.xx executor 8):...
