Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

vgautam
by New Contributor
  • 130 Views
  • 1 reply
  • 0 kudos

Differentiate null values in Variant Data type

Hello, based on the documentation here, try_variant_get returns NULL in both scenarios below: if the object cannot be found; if the object cannot be cast. How does one differentiate between the two scenarios?

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @vgautam, in the try_variant_get function, NULL is returned in two scenarios: Object Not Found: if the specified path does not exist in the JSON object. Invalid Cast: if the object at the specified path cannot be cast to the target type. To differe...
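One way to tell the two cases apart — a minimal sketch, assuming a VARIANT column named v and an int target type — is to pair the typed try_variant_get with the untyped path-extraction operator, which returns NULL only when the path is missing:

df = spark.sql("""
    SELECT
      try_variant_get(v, '$.price', 'int') AS typed_value,   -- NULL on missing path OR failed cast
      v:price IS NOT NULL AS path_exists,                    -- path extraction: NULL only when the path is missing
      v:price IS NOT NULL
        AND try_variant_get(v, '$.price', 'int') IS NULL AS cast_failed
    FROM (SELECT parse_json('{"price": "abc"}') AS v)
""")
df.show()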

PiotrM
by New Contributor II
  • 195 Views
  • 2 replies
  • 0 kudos

Canceling long-running queries on UC-enabled all-purpose clusters

Hey, as in the subject. Is it possible to set a timeout for long-running queries on all-purpose clusters that are UC-enabled? I know there is such a setting for SQL Warehouses and Workflows, but I was unable to find one for all-purpose clusters. The issu...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

@PiotrM thanks for your question! Adding to @Alberto_Umana's comment, could you please clarify what you mean by: "I tried things like spark.task.reaper.killTimeout, but it seems like UC clusters won't accept it."? Is it throwing an error, or is it ...

1 More Replies
berserkersap
by Contributor
  • 4729 Views
  • 4 replies
  • 1 kudos

Speed Up JDBC Write from Databricks Notebook to MS SQL Server

Hello everyone, I have a use case where I need to write a Delta table from Databricks to a SQL Server table using PySpark/Python/Spark SQL. The Delta table I am writing contains around 3 million records and the SQL Server table is neither partitione...

Data Engineering
JDBC
MS SQL Server
pyspark
Table Write
Latest Reply
VZLA
Databricks Employee
  • 1 kudos

@berserkersap have you had time to identify where the bottleneck is? E.g.: sequential writes, network latency/throughput, or maybe a connection pool limit in the target much lower than the number of connection threads in the source?
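For reference, the knobs usually involved look like this — a hedged sketch, assuming the stock Spark JDBC writer and hypothetical connection details:

# Parallelism = number of partitions; each partition opens its own JDBC connection.
(df.repartition(8)
   .write
   .format("jdbc")
   .option("url", "jdbc:sqlserver://host:1433;databaseName=mydb")  # hypothetical
   .option("dbtable", "dbo.target_table")                          # hypothetical
   .option("user", user)
   .option("password", password)
   .option("batchsize", 10000)   # bigger batches cut per-row round trips (default is 1000)
   .mode("append")
   .save())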

3 More Replies
vivek_cloudde
by New Contributor II
  • 391 Views
  • 8 replies
  • 2 kudos

Resolved! Issue while creating an on-demand cluster in Azure Databricks using PySpark

Hello, I am trying to create an on-demand cluster in Azure Databricks using the code below, and I am getting the error message {"error_code":"INVALID_PARAMETER_VALUE","message":"Exactly 1 of virtual_cluster_size, num_workers or autoscale must be specified."...

Latest Reply
VZLA
Databricks Employee
  • 2 kudos

@vivek_cloudde I still find it interesting that for all these different misconfigurations or wrong cluster definitions you got the same error message, but anyway, happy to hear it worked! If it helps, next time, and to make things simpler, ...
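For reference, a minimal sketch of a payload that satisfies the "exactly 1 of num_workers or autoscale" rule — the cluster name, Spark version, and node type below are hypothetical, and workspace_host/token are assumed to be defined:

import requests

payload = {
    "cluster_name": "on-demand-demo",
    "spark_version": "15.4.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 2,
    # "autoscale": {"min_workers": 1, "max_workers": 4},  # either num_workers OR autoscale, never both
}
resp = requests.post(
    f"https://{workspace_host}/api/2.1/clusters/create",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
print(resp.json())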

7 More Replies
nikhil_kumawat
by New Contributor II
  • 334 Views
  • 8 replies
  • 2 kudos

Not able to retain precision while reading data from source file

Hi, I am trying to read a CSV file located in an S3 bucket folder. The CSV file contains around 50 columns, one of which is "litre_val", which contains values like "60211.952", "59164.608". Up to 3 decimal points. Now to read this CSV we ...

precision.png
Latest Reply
VZLA
Databricks Employee
  • 2 kudos

@nikhil_kumawat can you provide more details so we can reproduce this and better help you? E.g.: sample data set, DBR version, reproducer code, etc. I'm using this sample data: csv_content = """column1,column2,litre_val,another_decimal_column 1,TypeA,60211...
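A minimal sketch of one common fix for this kind of precision loss, assuming the column names from the sample above: supply an explicit schema with DecimalType instead of letting inference pick float/double:

from pyspark.sql.types import StructType, StructField, StringType, DecimalType

schema = StructType([
    StructField("column1", StringType()),
    StructField("column2", StringType()),
    StructField("litre_val", DecimalType(18, 3)),             # preserves all three decimal places
    StructField("another_decimal_column", DecimalType(18, 3)),
])

df = spark.read.csv("s3://bucket/path/", header=True, schema=schema)  # path is hypothetical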

7 More Replies
maxutil
by New Contributor II
  • 16164 Views
  • 4 replies
  • 3 kudos

Invalid Characters in Column Names " ,;{}()\n\t="

I'm reading data into a dataframe with df = spark.read.json("s3://somepath/"). I've tried first creating a Delta table using the DeltaTable API with:

DeltaTable.createIfNotExists(spark)\
    .location(target_path)\
    .addColumns(df.sche...

Latest Reply
VZLA
Databricks Employee
  • 3 kudos

@jb1z @maxutil Have you tried it like this?

import dlt

@dlt.table(table_properties={'quality': 'bronze', 'delta.columnMapping.mode': 'name'})
def netsuite_items_inventory_price():
    return (
        spark.readStream.format('cloudFiles') ...
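For the non-DLT path the original post took (DeltaTable.createIfNotExists), the equivalent knob is the same table property — a hedged sketch, with target_path assumed defined and the protocol versions taken from the column-mapping requirement:

from delta.tables import DeltaTable

(DeltaTable.createIfNotExists(spark)
    .location(target_path)
    .property("delta.columnMapping.mode", "name")   # permits " ,;{}()\n\t=" in column names
    .property("delta.minReaderVersion", "2")        # column mapping needs newer protocol versions
    .property("delta.minWriterVersion", "5")
    .addColumns(df.schema)
    .execute())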

3 More Replies
Algocrat
by New Contributor II
  • 2933 Views
  • 2 replies
  • 2 kudos

Resolved! Discover and redact PII

Hi! What is the best way to discover and redact PII? Does Databricks offer any frameworks, sets of methods, or processes that we may follow?

Latest Reply
viswesh
New Contributor II
  • 2 kudos

Hey @Algocrat @szymon_dybczak, just wanted to let you know that Databricks is currently working on a product to tackle PII / sensitive data classification. If you're a current customer, we recommend you reach out to your account representative to l...
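In the meantime, a minimal illustrative sketch of the redaction half (discovery in practice usually needs a proper classifier; the dataframe, column name, and regex patterns here are hypothetical):

from pyspark.sql import functions as F

EMAIL_RE = r'[\w\.-]+@[\w\.-]+\.\w+'        # simplistic email pattern, for illustration only
SSN_RE = r'\b\d{3}-\d{2}-\d{4}\b'           # US SSN shape, for illustration only

redacted = (df
    .withColumn("notes", F.regexp_replace("notes", EMAIL_RE, "[REDACTED_EMAIL]"))
    .withColumn("notes", F.regexp_replace("notes", SSN_RE, "[REDACTED_SSN]")))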

1 More Replies
Shivaprasad
by New Contributor III
  • 355 Views
  • 10 replies
  • 3 kudos

Accessing Delta tables using an API outside Azure (Workiva)

I need to access Delta tables via an API outside Azure, in a reporting tool (Workiva), using the connector. Can someone provide details on how I can achieve this?

Latest Reply
RiyazAli
Valued Contributor II
  • 3 kudos

Hello @Shivaprasad, expanding on the code snippet provided above: once you run the REST API code provided above, you will get the statement ID. Use this statement_id to get the query results using the same Statements API. The code snippet is as below: impor...
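The shape of that follow-up call — a sketch assuming the Statement Execution API, with workspace_host, token, and statement_id as hypothetical variables carried over from the earlier request:

import requests

resp = requests.get(
    f"https://{workspace_host}/api/2.0/sql/statements/{statement_id}",
    headers={"Authorization": f"Bearer {token}"},
)
body = resp.json()
print(body["status"]["state"])                # e.g. PENDING, RUNNING, SUCCEEDED
if body["status"]["state"] == "SUCCEEDED":
    print(body["result"]["data_array"])       # inline rows for small result sets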

9 More Replies
semsim
by Contributor
  • 2421 Views
  • 6 replies
  • 0 kudos

Resolved! Installing LibreOffice on Databricks

Hi, I need to install LibreOffice to do a document conversion from .docx to .pdf. The requirement is no use of containers. Any idea how I should go about this? Environment: Databricks 13.3 LTS. Thanks, Sem

Latest Reply
furkan
New Contributor II
  • 0 kudos

Hi @semsim, I'm attempting to install LibreOffice for converting DOCX files to PDF and tried running your shell commands from a notebook. However, I encountered the 404 errors shown below. Do you have any suggestions on how to resolve this issue? I real...
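A common cause of apt 404s on a fresh cluster is a stale package index, so refreshing it before installing usually helps — a hedged sketch of the sequence, run from a notebook (paths are hypothetical):

import subprocess

subprocess.run(["apt-get", "update"], check=True)   # refresh the index first; stale mirrors cause 404s
subprocess.run(["apt-get", "install", "-y", "libreoffice-writer"], check=True)
subprocess.run(["libreoffice", "--headless", "--convert-to", "pdf",
                "/dbfs/tmp/input.docx", "--outdir", "/dbfs/tmp/"], check=True)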

5 More Replies
ChsAIkrishna
by New Contributor III
  • 225 Views
  • 9 replies
  • 0 kudos

Databricks SQL Warehouse queries went to an orphan state

We're experiencing an issue with our Databricks dbt workflow. The workflow job uses an L-size SQL warehouse cluster that's been working smoothly for the past couple of weeks. However, today we've noticed that, at a specific time, all queries are g...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Thanks. MSFT might reach out to us, and we will provide assistance.

8 More Replies
soumiknow
by Contributor
  • 451 Views
  • 10 replies
  • 2 kudos

Resolved! How to resolve a 'connection refused' error while using a google-cloud lib in a Databricks Notebook?

I want to use the google-cloud-bigquery library in my PySpark code, though I know that the spark-bigquery-connector is available. The reason I want to use it is that the Databricks Cluster 15.4 LTS comes with the 0.22.2-SNAPSHOT version of the spark-bigquery-connector, wh...

Latest Reply
VZLA
Databricks Employee
  • 2 kudos

@soumiknow sounds good! Please let me know if you need some internal assistance with the communication process.

9 More Replies
ibrahim21124
by New Contributor III
  • 3496 Views
  • 3 replies
  • 1 kudos

Resolved! Databricks Job Timeout after 20 minutes

Hello, I have created a job with no timeout-seconds provided, but I am getting "Error: Timed out within 20 minutes". I am running the commands below using a Bash@3 task in an ADO Pipeline YAML file. The code for the same is given below: task: Bash@3 timeoutIn...

Edthehead
by Contributor II
  • 247 Views
  • 1 reply
  • 0 kudos

Restoring a table from a Delta Live Tables pipeline

I have a DLT pipeline running to ingest files from storage using Auto Loader. We have a bronze table and a silver table. A question came up from the team on how to restore DLT tables to a previous version in case of some incorrect transformation. When ...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

The RESTORE command is not supported on streaming tables, which is why you encountered the error. Instead, you can use the time travel feature of Delta Lake to query previous versions of the table. You can use the VERSION AS OF or TIMESTAMP AS OF c...
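A minimal sketch of that time-travel read, with a hypothetical table name and version/timestamp:

# Read a previous version of the table by version number...
previous = spark.sql("SELECT * FROM catalog.schema.silver_table VERSION AS OF 42")
# ...or by timestamp:
previous = spark.sql(
    "SELECT * FROM catalog.schema.silver_table TIMESTAMP AS OF '2024-12-01T00:00:00'"
)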

Abishrp
by New Contributor III
  • 381 Views
  • 7 replies
  • 3 kudos

Resolved! Issue in getting a list of pricing details in JSON

I can view the pricing details using the Databricks pricing calculator. Can I get pricing details in the form of JSON, or are APIs available to get pricing details? I particularly need the per-hour DBU rate for each instance.

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 3 kudos

Hi @Abishrp, unfortunately, as of now, Databricks does not provide a dedicated public API to directly retrieve pricing information in JSON format (or, to be precise, the Azure Pricing Calculator doesn't have such an option).
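One hedged workaround on Azure: the generic Azure Retail Prices API (an Azure service, not a Databricks one) does expose Azure Databricks meters as JSON, though mapping meters to instance-level DBU rates is left to the caller:

import requests

resp = requests.get(
    "https://prices.azure.com/api/retail/prices",
    params={"$filter": "serviceName eq 'Azure Databricks'"},
)
for item in resp.json()["Items"][:5]:
    print(item["meterName"], item["retailPrice"], item["unitOfMeasure"])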

6 More Replies
QueryingQuagga
by New Contributor III
  • 576 Views
  • 7 replies
  • 4 kudos

Resolved! Working with semi-structured data (complex - variant)

Edit: the value of the inner key "value" was an array - I have added the square brackets to the example below. Hello all, I'm working with the Spark SQL API for querying semi-structured data in Databricks. Currently I'm having a hard time understanding how I can n...

Data Engineering
Complex datatypes
Databricks SQL Warehouse
spark sql
Variant datatype
Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 4 kudos

Hi @QueryingQuagga, maybe something like this?:

%sql
WITH src AS (
  SELECT parse_json('{ "extendedinformation":[ { "name": "CHANNEL", "value": [{\"id\":\"DUMMYID1\",\"name\":\"DUMMYCHANNEL1\",\"role\":\"DUMMYROLE1\"}]}, ...
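If the nesting goes deeper, a complementary sketch using variant_explode to walk the arrays (the column and key names are assumed from the example above):

df = spark.sql("""
    WITH src AS (
      SELECT parse_json(
        '{"extendedinformation":[{"name":"CHANNEL","value":[{"id":"DUMMYID1","name":"DUMMYCHANNEL1","role":"DUMMYROLE1"}]}]}'
      ) AS v
    )
    SELECT e.value:name::string AS info_name,
           c.value:id::string   AS channel_id,
           c.value:role::string AS channel_role
    FROM src,
         LATERAL variant_explode(src.v:extendedinformation) AS e,
         LATERAL variant_explode(e.value:value) AS c
""")
df.show()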

6 More Replies
